VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization
Authors
Abstract
Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and the final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is a local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of the final DNN model accuracy. In this paper, we propose a novel metric, called Vector Loss. Using this new metric, we decompose the minimization of the DQL into two independent processes that significantly outperform the traditional iterative L2-based process in terms of effectiveness. We also develop a quantization solution, called VecQ, which provides minimal quantization loss. In order to speed up the proposed quantization during training, we accelerate it with a parameterized probability estimation and a template-based derivation calculation. We evaluate our algorithm on the MNIST, CIFAR, ImageNet, IMDB movie review, and THUCNews text data sets with numerical models. The results demonstrate that it outperforms state-of-the-art approaches yet offers flexible bitwidth support. Moreover, the evaluation of the quantized models on Salient Object Detection (SOD) tasks shows that they maintain comparable feature extraction quality with 16× weight size reduction.
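The abstract describes decomposing the minimization of the direct quantization loss into two independent processes. As a rough, illustrative sketch only (not the paper's actual Vector Loss formulation), one can split quantization of a weight vector into an integer "orientation" step on a fixed grid, followed by a closed-form least-squares scale; the function name and grid choice here are assumptions for illustration:

```python
import numpy as np

def quantize_vectorized(w, bits=3):
    """Toy two-step quantization sketch (illustrative, not VecQ itself).

    1) Orientation: map each weight to the nearest level of a symmetric
       integer grid after normalizing by the largest magnitude.
    2) Magnitude: pick the single scale alpha that minimizes the L2
       distance between w and alpha * q (closed-form least squares).
    """
    levels = 2 ** (bits - 1) - 1            # symmetric grid, e.g. -3..3 for 3 bits
    unit = w / (np.max(np.abs(w)) + 1e-12)  # normalize to unit range
    q = np.round(unit * levels)             # integer "orientation" vector
    alpha = float(w @ q) / float(q @ q)     # least-squares scale for this q
    return alpha * q, alpha, q

rng = np.random.default_rng(0)
w = rng.normal(size=8)
w_hat, alpha, q = quantize_vectorized(w, bits=3)
# the least-squares scale guarantees the residual shrinks:
print(np.linalg.norm(w - w_hat) < np.linalg.norm(w))  # True
```

Because alpha is the least-squares projection coefficient, the residual norm can never exceed the original norm, which is the sense in which the scale step is "independent" of the grid step in this toy version.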
Similar papers
Quantization-based language model compression
This paper describes two techniques for reducing the size of statistical back-off n-gram language models in computer memory. Language model compression is achieved through a combination of quantizing language model probabilities and back-off weights and the pruning of parameters that are determined to be unnecessary after quantization. The recognition performance of the original and compressed l...
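The quantization half of the scheme above amounts to replacing each stored probability with a small bin index plus a shared lookup table. A minimal sketch, assuming simple uniform binning of log-probabilities (the real systems typically use non-uniform codebooks):

```python
def quantize_logprobs(logprobs, bits=8):
    """Toy sketch: map each log-probability to one of 2**bits bins
    spanning the observed range. Only the bin index per n-gram plus one
    shared bin table is stored, which is the memory win for large tables."""
    lo, hi = min(logprobs), max(logprobs)
    n = 2 ** bits
    step = (hi - lo) / (n - 1) or 1.0        # guard against a degenerate range
    idx = [round((lp - lo) / step) for lp in logprobs]
    table = [lo + i * step for i in range(n)]
    return idx, table
```

Storing an 8-bit index instead of a 32-bit float gives roughly a 4x reduction for the probability arrays, before any pruning.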
Loss-aware Weight Quantization of Deep Networks
The huge size of deep networks hinders their use in small computing devices. In this paper, we consider compressing the network by weight quantization. We extend a recently proposed loss-aware weight binarization scheme to ternarization, with possibly different scaling parameters for the positive and negative weights, and m-bit (where m > 2) quantization. Experiments on feedforward and recurren...
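The ternarization with separate scaling parameters for positive and negative weights mentioned above can be sketched as follows; the threshold fraction and the mean-magnitude scales are common closed-form choices assumed here for illustration, not this paper's loss-aware formulation:

```python
import numpy as np

def ternarize_asymmetric(w, delta_frac=0.7):
    """Toy ternarization with separate scales for + and - weights.

    Weights with |w| below a threshold become 0; the rest map to
    +alpha_p or -alpha_n, where each scale is the mean magnitude of the
    surviving weights of that sign. delta_frac is an illustrative choice.
    """
    delta = delta_frac * np.mean(np.abs(w))
    pos = w > delta
    neg = w < -delta
    alpha_p = w[pos].mean() if pos.any() else 0.0
    alpha_n = -w[neg].mean() if neg.any() else 0.0
    return alpha_p * pos.astype(w.dtype) - alpha_n * neg.astype(w.dtype)
```

Allowing alpha_p != alpha_n lets the quantizer track skewed weight distributions, which a single shared scale cannot.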
Model compression via distillation and quantization
Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, wh...
Medical Image Compression Using Vector Quantization and Gaussian Mixture Model
Codebook design for vector quantization could be performed using clustering technique. The Gaussian Mixture Modeling (GMM) clustering algorithm involves modeling a statistical distribution by a mixture (or weighted sum) of other distributions. GMM has proven superior efficiency in both time and accuracy and has been used with vector quantization in some applications. This paper introduces a med...
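Codebook design by clustering, as described above, can be sketched with k-means, which is the hard-assignment special case of GMM fitting (a full GMM would use soft responsibilities and covariances); the function below is an assumed illustration, not the paper's algorithm:

```python
import numpy as np

def design_codebook(vectors, k=4, iters=20, seed=0):
    """Toy VQ codebook design by k-means clustering.

    Alternates nearest-codeword assignment with centroid updates and
    returns k centroid vectors to use as the codebook."""
    rng = np.random.default_rng(seed)
    # initialize codewords from k distinct training vectors
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for j in range(k):
            if (labels == j).any():
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook, labels
```

Each image block is then stored as the index of its nearest codeword, so the compression ratio is set by the block size and log2(k) bits per index.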
Journal
Journal title: IEEE Transactions on Computers
Year: 2021
ISSN: 1557-9956, 2326-3814, 0018-9340
DOI: https://doi.org/10.1109/tc.2020.2995593